UKON-Fischer-MC1

VAST 2012 Challenge
Mini-Challenge 1: Bank of Money Enterprise: Cyber Situation Awareness

 

 

Team Members:

 

Fabian Fischer, University of Konstanz, Fabian.Fischer@uni-konstanz.de (Primary)

Johannes Fuchs, University of Konstanz, fuchs@dbvis.inf.uni-konstanz.de

Florian Mansmann, University of Konstanz, Florian.Mansmann@uni-konstanz.de

Daniel A. Keim, University of Konstanz, Daniel.Keim@uni-konstanz.de

 

Student Team:  No

 

Tool(s):

 

BANKSAFE, Java Web Application developed by us for the VAST Challenge.

Apache Tomcat, Server Infrastructure.

Vaadin, Java Web Framework.

Google BigQuery, Scalable Backend Database.

Memcached, High-performance, distributed memory object caching system.

Ehcache, Java-based persistent cache.

D3.js, JavaScript library for web visualizations.

Java Applets, Some visualizations are traditional Java Applets.

Scripting Languages & R, Preprocessing and chart plotting.

 

Video:

 

 

 

 

Answers to Mini-Challenge 1 Questions:

 

MC 1.1  Create a visualization of the health and policy status of the entire Bank of Money enterprise as of 2 pm BMT (BankWorld Mean Time) on February 2. What areas of concern do you observe? 

 

To get an overview of the health and policy status of the entire Bank of Money enterprise, we developed a treemap visualization that can show the whole network. The following picture shows the policy status. Because it is important to distinguish between the different machine classes, we used the following hierarchy in the treemap:

 

machine classes -> business units -> facilities -> policy status 1 to 5 (colored from green through orange to red)

 

The sizes of the rectangles are mapped to the number of underlying hosts. This confirms the overall impression that the majority of hosts have a good policy status.
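The hierarchy and size mapping described above can be sketched as a nested count structure. This is only an illustrative sketch, not the BANKSAFE implementation: the record field names (machine_class, business_unit, facility, policy_status), the sample hosts, and both function names are assumptions made for this example.

```python
from collections import defaultdict

# Made-up sample hosts; the field names are assumptions for illustration.
hosts = [
    {"machine_class": "workstation", "business_unit": "region-5",
     "facility": "branch-1", "policy_status": 2},
    {"machine_class": "workstation", "business_unit": "region-5",
     "facility": "branch-1", "policy_status": 2},
    {"machine_class": "workstation", "business_unit": "region-1",
     "facility": "branch-7", "policy_status": 1},
    {"machine_class": "server", "business_unit": "headquarters",
     "facility": "datacenter-2", "policy_status": 5},
]


def build_treemap_counts(hosts):
    """Nest host counts as machine class -> business unit -> facility -> policy status."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(int))))
    for h in hosts:
        tree[h["machine_class"]][h["business_unit"]][h["facility"]][h["policy_status"]] += 1
    return tree


def subtree_size(node):
    """Rectangle size of a node: the total number of hosts below it."""
    if isinstance(node, int):  # leaf: host count for one policy status
        return node
    return sum(subtree_size(child) for child in node.values())


tree = build_treemap_counts(hosts)
```

A treemap layout would then give each rectangle an area proportional to subtree_size of the corresponding node, and color the leaves by policy status.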

 

However, two regions in particular (region-5 and region-10) are very suspicious at that point in time, because all of their machines suffer from moderate policy deviations.

 

 

The visualization can also be changed to visualize the activity flag attribute accordingly.

 

Analysts are interested not only in the general overview, but also in a more detailed view that focuses on suspicious computers. Among the large number of green machines, the suspicious ones are almost completely hidden. For this reason, the analyst can change the display to map the policy status to the rectangle area. This emphasizes highly suspicious machines with critical policy deviations.
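The effect of switching the size mapping can be illustrated with a small calculation. The sketch below assumes that, in the alternative mode, each host contributes an area equal to its policy status; the exact weighting used in BANKSAFE is not stated here, and the group labels and counts are made up.

```python
# Made-up leaf groups of the treemap: (label, policy status, host count).
groups = [("healthy", 1, 98), ("critical", 5, 2)]


def area_shares(groups, by_policy_status=False):
    """Return each group's share of the total treemap area.

    Default sizing: one unit of area per host.
    Alternative sizing (an assumption): each host contributes its policy status.
    """
    weights = {
        label: count * (status if by_policy_status else 1)
        for label, status, count in groups
    }
    total = sum(weights.values())
    return {label: weight / total for label, weight in weights.items()}


count_based = area_shares(groups)
status_based = area_shares(groups, by_policy_status=True)
```

With these made-up numbers, the critical group grows from 2% of the area to roughly 9%, which is why the few problematic machines become easier to spot.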

 

The following visualization helps to identify compromised and virus-infected machines. All of the deep red hosts should definitely be taken care of.

 

Especially noteworthy is one server (highlighted in the following figure) of the headquarters in datacenter-2, which has a policy status of 5. The administrators should take this server offline immediately, because the infection could spread to the other servers as well. The administrators probably haven’t noticed the problem yet, because over 90% of the servers (> 40,000 hosts) have no problem at all.

 

 

MC 1.2  Use your visualization tools to look at how the network’s status changes over time. Highlight up to five potential anomalies in the network and provide a visualization of each. When did each anomaly begin and end? What might be an explanation of each anomaly?

 

Potential Anomaly #1

 

To provide a compact but information-dense display, we implemented the following matrix idea.

 

The 5x5 colored matrix represents all possible combinations of policy and activity scores. Each cell shows the number of hosts with that combination for an aggregated time span in a single region. In the example above, we highlight the rectangle that contains two hosts with policy status 5 and activity flag 3. A single matrix is not that insightful, but when we arrange the matrices in a small-multiple display, clear patterns become visible. Each row in the following figure represents the time series for a particular region. Each column represents one hour in the dataset. The ordering of the columns is done according to an MDS projection of the geographic coordinates of the respective headquarters, so that geographically close regions are plotted near each other. As a result, the following day/night pattern emerges, which is slightly shifted because of the different time zones.
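The aggregation behind such a matrix can be sketched as follows. This is a minimal sketch under assumptions: the tuple layout of the reports, the host names, and the choice to count each host at most once per cell and hour are not taken from the BANKSAFE code.

```python
from collections import defaultdict


def small_multiple_matrices(reports):
    """Build one 5x5 host-count matrix per (region, hour).

    reports: iterable of (region, hour, host, policy_status, activity_flag),
    with policy_status and activity_flag in 1..5.
    Rows of each matrix are policy status, columns are activity flag.
    """
    # Collect distinct hosts per cell so repeated reports are counted once.
    hosts_per_cell = defaultdict(set)
    for region, hour, host, policy, activity in reports:
        hosts_per_cell[(region, hour, policy, activity)].add(host)

    matrices = defaultdict(lambda: [[0] * 5 for _ in range(5)])
    for (region, hour, policy, activity), hosts in hosts_per_cell.items():
        matrices[(region, hour)][policy - 1][activity - 1] = len(hosts)
    return dict(matrices)


# Made-up sample reports.
reports = [
    ("region-5", 14, "host-a", 2, 1),
    ("region-5", 14, "host-a", 2, 1),  # repeated report, counted once
    ("region-5", 14, "host-b", 2, 1),
    ("region-5", 14, "host-c", 5, 3),
    ("region-5", 14, "host-d", 5, 3),
    ("region-10", 14, "host-e", 2, 2),
]
matrices = small_multiple_matrices(reports)
```

Each (region, hour) key corresponds to one small matrix in the display; in this sample, the cell for policy status 5 and activity flag 3 in region-5 holds two hosts.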

 

The 12th column shows the headquarters. Activity there seems to last 3 to 4 hours longer than in most other regions, which is interesting, but probably not particularly critical. Again, region-5 and region-10 stand out, because many of their hosts have policy status 2.

 

 

Potential Anomaly #2

 

Another interesting anomaly is a continuous shift toward higher policy status levels. This means that the hosts within the regions become more and more infected as time goes by. This can be revealed by following the rectangles and using the tooltips.

 

 

This effect can be seen directly in a line chart visualization. The following figure shows a clear downward trend of hosts with policy status 2 and a clear upward trend in the number of infected machines, which rises steeply toward the end of the time period. The reason for this is probably an unpatched security hole, which is exploited by a virus spreading in the network.
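The series plotted in such a line chart can be derived by counting hosts per policy status at each timestamp. The sketch below uses made-up reports and an assumed (timestamp, policy_status) tuple layout; it only illustrates the counting step, not the plotting.

```python
from collections import Counter, defaultdict


def status_counts_over_time(reports):
    """Return sorted timestamps and one count series per policy status 1..5.

    reports: iterable of (timestamp, policy_status) pairs, one per host report.
    """
    per_time = defaultdict(Counter)
    for timestamp, policy in reports:
        per_time[timestamp][policy] += 1
    times = sorted(per_time)
    series = {
        status: [per_time[t][status] for t in times]
        for status in range(1, 6)
    }
    return times, series


# Made-up sample: three host reports at each of two timestamps.
reports = [
    ("2012-02-02 14:00", 1), ("2012-02-02 14:00", 2), ("2012-02-02 14:00", 2),
    ("2012-02-03 14:00", 2), ("2012-02-03 14:00", 3), ("2012-02-03 14:00", 4),
]
times, series = status_counts_over_time(reports)
```

Plotting each entry of series against times gives one line per policy status, which makes shifts from lower to higher status levels visible.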

 

 

Similar to the very first figure in this submission, the following figure shows what the administrator sees at a later stage of the network infection (many yellow and red machines). This demonstrates the situational awareness that can be gained from this overview visualization for a single point in time. Interactive exploration and drill-down functionality are provided to fully explore the dataset.

 

 

 

There are many other anomalies and interesting patterns in the data; however, most regions show quite similar overall patterns and trends. The administrators should focus on patching the machines immediately and analyze the log files to reconstruct the malicious behavior and investigate a possible data breach.

 

Deployment of BANKSAFE

 

The developed BANKSAFE framework is a web application which makes use of scalable distributed database technologies and high-performance caching to provide situational awareness for large-scale networks.
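The general idea of placing a cache in front of a slower backend database can be sketched as a read-through cache. This is a generic illustration, not the BANKSAFE code: the class ReadThroughCache, the function slow_query, and the in-process dictionary are hypothetical stand-ins for the actual caching and database layers.

```python
class ReadThroughCache:
    """Serve repeated requests from memory; call the loader only on a miss."""

    def __init__(self, loader):
        self._loader = loader  # function that runs the expensive backend query
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        value = self._loader(key)
        self._store[key] = value
        return value


backend_calls = []


def slow_query(key):
    """Hypothetical stand-in for an expensive database query."""
    backend_calls.append(key)
    return "result for " + key


cache = ReadThroughCache(slow_query)
first = cache.get("policy status, region-5")
second = cache.get("policy status, region-5")  # answered from the cache
```

Only the first request reaches the backend; identical follow-up requests, such as redrawing the same overview, are answered from memory.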